Knowledge distillation

A technique for compressing a model: a smaller "student" model is trained to reproduce the outputs of a larger "teacher" model, reducing size and inference cost while preserving as much of the teacher's performance as possible. DistilBERT (distilled from BERT) is a well-known example.
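A minimal sketch of the classic soft-label distillation loss from Hinton et al. (2015), assuming a PyTorch setup; the temperature T and mixing weight alpha are illustrative hyperparameters, not values from any particular paper:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: KL divergence between the student's and the teacher's
    # temperature-softened output distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale gradients after temperature softening
    # Hard targets: ordinary cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

During training the teacher is frozen (eval mode, no gradients) and only the student's parameters are updated. DistilBERT combines this soft-target loss with additional terms (a masked-LM loss and a cosine loss on hidden states).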

Raunak2019effective is about word embeddings (dimensionality reduction of pre-trained embeddings), not model distillation.